Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added IValue.getPatternMatchFingerprint() that simulates rascal-core:TopLevelType.getFingerprint(v) exactly for the value kinds we have in vallang #210

Merged
merged 12 commits into from
Oct 17, 2023

Conversation

jurgenvinju
Copy link
Member

No description provided.

…TopLevelType.getFingerprint(v) exactly for the value kinds we have in vallang
default int getPatternMatchFingerprint() {
return getName().hashCode() << 2 + arity();

// TODO: this would distinguish more constructors based on incomparable argument types:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost committed this, but I realize this has to be reflected in the switch code generator in the compiler as well. So we need to have a java function that produces the hashCode for a given constructor then. We can push this to the future. We only have to realize that changing these hashCodes breaks all earlier compiled Rascal code, always.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed. We already have this at the moment in util::Reflective via the functions getFingerprint (2 versions) and getFingerprintNode. Note that the former has a boolean parameter concretePatterns. In the new compiler set-up a single function getFingerprint(value v) will do. (In the compiler runtime I have a function getConcreteFingerprint that operates on concrete syntax trees.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So leave as is for now and change later?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes let's do that

@github-actions
Copy link

github-actions bot commented Oct 6, 2023

Test Results

         96 files  ±         0           96 suites  ±0   8m 9s ⏱️ -19s
242 283 tests +16 600  242 282 ✔️ +16 600  1 💤 ±0  0 ±0 
726 936 runs  +49 800  726 933 ✔️ +49 800  3 💤 ±0  0 ±0 

Results for commit 8105952. ± Comparison against base commit 65115ed.

♻️ This comment has been updated with latest results.

@jurgenvinju
Copy link
Member Author

The tests I just added fail. So not ready to merge but it's good they fail now :-)

@PaulKlint
Copy link
Member

I find the name getPatternMatchFingerprint a bit verbose. Maybe getMatchFingerprint?

It is also useful to add a function that returns a default value, e.g. getMatchFingerprintDefault.

@PaulKlint PaulKlint marked this pull request as ready for review October 6, 2023 16:32
@jurgenvinju
Copy link
Member Author

What should the default method do exactly? Need more info.

@PaulKlint
Copy link
Member

In the compiler I map patterns like int n to a default value (currently zero) and that is later mapped to the default of the generated switch statement.

The only guarantee that I need is that the default value is never generated for something else.

@DavyLandman
Copy link
Member

DavyLandman commented Oct 14, 2023 via email

default int getMatchFingerprint() {
int hash = getName().hashCode();

return hash == 0 ? 13547528 /* "node".hashCode() << 2*/ + arity() : (hash << 2) + arity();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hash << 2 throws away important bits?

@jurgenvinju
Copy link
Member Author

So to summarize the discussion:

  • we only use String.hashCode() currently --in IString, INode, and IConstructor-- which is extremely stable for the same reasons as our own getMatchFingerprint must be stable (constants used in switch cases). I conclude we have no JVM version portability issue here. And I test for this quite extensively in IValueSpecification.
  • Our getMatchFingerprint implementations reuse our hashCode implementations sometimes because they have been optimized for speed and uniformity already. If we decide to change the hashCode methods we can always copy their old implementation to getMatchFingerprint methods. This liability has been encapsulated by introducing getMatchFingerprint in the first place as a separate method.
  • The getMatchFingerprint implementations have all been changed to never return 0 anymore; this is different from the implementation in rascal-core that we ported this from.
  • The fingerprints for INode, IConstructor, and ITuple shift 2 bits of significant and effective randomness away, and I propose to change that by simply adding the arities to the hashCodes of the string names. All we lose is that we can't read the arity from the hashCode anymore like before (in binary form). And we still have different fingerprints for different arities with the same name, which is what this is all about.
  • IConstructors could also hash on the (uninstantiated) constructor type instead of only the name. This could help distinguish operators like and(Bool, int) and and(int, Bool). Given that these kind of distinctions are not that prevalent and computing the hashCode of a ConstructorType may be a lot more expensive than just the name of the constructor, I think we should stick with the latter.

@DavyLandman
Copy link
Member

DavyLandman commented Oct 16, 2023

Good summary.

The fingerprints for INode, IConstructor, and ITuple shift 2 bits of significant and effective randomness away, and I propose to change that by simply adding the arities to the hashCodes of the string names.

Maybe nitpicking, but doesn't adding increase the risk of clearing the lower bits? This is only a concern if it's about uniformity I guess.

@jurgenvinju
Copy link
Member Author

Yes, so multiplying with a small unique prime number would help with that. If we use 17 * arity() for INodes and 23 * arity() for IConstructors, that would also immediately make sure that INodes and IConstructors get a different slot even though they have the same name.

@PaulKlint would that be correct? Or should we make sure node patterns have the same fingerprints as their constructor value counterparts? It may be that a node pattern matches against a constructor value? ... thinking out loud here.

@jurgenvinju
Copy link
Member Author

If constructors should behave as nodes, and vice versa, then I've another test to write.

@jurgenvinju
Copy link
Member Author

Thanks for the discussion @DavyLandman @PaulKlint ; I think we have something to work with now; the next step is the rascal project and adding the right codes to the other types of values.

@jurgenvinju jurgenvinju merged commit 0a90ceb into main Oct 17, 2023
@jurgenvinju jurgenvinju deleted the pattern-match-fingerprint branch October 17, 2023 07:43
Copy link
Member

@DavyLandman DavyLandman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jurgenvinju I had 2 more cases where I wanted your opinion on.

pom.xml Show resolved Hide resolved
@jurgenvinju
Copy link
Member Author

mmmm... I still have a minor puzzle to solve:

@ParameterizedTest @ArgumentsSource(ValueProvider.class)
    public void testFingerprintEqualArityConstructorsTheSameDoNotChangeTheTest(IConstructor node1, IConstructor node2) {
        if (node1.arity() == node2.arity()) {
            assertEquals(node1.getMatchFingerprint(), node2.getMatchFingerprint());
        }
    }

Why does this succeed?

@jurgenvinju
Copy link
Member Author

Because the random generator always produces x()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants